Replication Tutorial

Lars Vilhuber
April 2019

Overview

  • High-level overview (15:00)
  • Details of Reproducibility Checks (15:00)
  • A concrete example

Replication and Reproducibility in Social Sciences and Statistics: Context, Concerns, and Concrete Measures

Paris presentation

Details of Reproducibility Checks

A concrete example

We are going to review a fully reproducible example:

  • Step 1: elements of the reproducible analysis
  • Step 2: curation of data for reproducible analysis
  • Step 3: robustness and automation

Requirements

  • web browser
  • some R knowledge (not much)

Let's get started

The Census Bureau published a blog post with accompanying data.

  • I attempted to replicate it
  • The replication itself should be replicable

The Context

We are going to focus on one figure.

Original

[Figure: original]

Replicated

[Figure: replicated]

Let's start

[Image: scan]

First problem

When the replicated material disappears

Consider the key inputs to this replication:

  • the original article
  • the original data
  • my article replicating the original article
  • the data for my article

Safeguarding scientific output

The role of journals is to provide a permanent record of scientific knowledge.

  • how reliable is that record?
  • where are journals stored?
  • what if the information is not in a journal?

Safeguarding scientific output

  • journals disappear, as do websites
  • paper journals are stored in libraries
  • e-journals are preserved in a system called LOCKSS (Lots of Copies Keep Stuff Safe)
  • data should be stored in repositories

Solving the first snag

Building a replicable document

Solving dependencies (R)

  • pin package versions with packrat or checkpoint
  • declare dependencies explicitly [1]
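####################################
# global libraries used everywhere #
####################################
# Package lock-in (optional): install from a dated CRAN snapshot.
# Note: the MRAN snapshot service was retired in 2023; if the URL below
# no longer resolves, point 'repos' at another dated CRAN snapshot.
MRAN.snapshot <- "2019-01-01"
options(repos = c(CRAN = paste0("https://mran.revolutionanalytics.com/snapshot/", MRAN.snapshot)))

# Load package x, installing it (with its dependencies) first if needed.
pkgTest <- function(x)
{
        if (!require(x, character.only = TRUE))
        {
                install.packages(x, dependencies = TRUE)
                if (!require(x, character.only = TRUE)) stop("Package not found: ", x)
        }
        return("OK")
}

# Declare every dependency explicitly, then load (or install) each one.
global.libraries <- c("dplyr", "devtools", "rprojroot", "tictoc")
results <- sapply(as.list(global.libraries), pkgTest)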
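
Declaring dependencies tells a replicator which packages are needed, but not which versions were actually used. One possible complement, sketched here in base R only, is to write the installed versions to a small file that is committed alongside the code. The helper name record_versions and the output file name are assumptions made for illustration; they are not part of the original materials.

```r
# Sketch: record the installed version of each declared dependency.
# 'record_versions' is a hypothetical helper, not part of the original code.
record_versions <- function(packages, outfile) {
  # packageVersion() stops with an error if a package is not installed
  versions <- vapply(
    packages,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  manifest <- data.frame(
    package   = packages,
    version   = unname(versions),
    r_version = as.character(getRversion()),
    stringsAsFactors = FALSE
  )
  write.csv(manifest, outfile, row.names = FALSE)
  invisible(manifest)
}

# Usage: base packages are used here so the sketch runs anywhere;
# in the project one would pass global.libraries instead.
outfile  <- file.path(tempdir(), "package-versions.csv")
manifest <- record_versions(c("stats", "utils", "tools"), outfile)
print(manifest)
```

Unlike a snapshot, such a file does not pin anything. It only documents what was installed at run time, so it is useful mainly for diagnosing differences between the original run and a later replication.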

Solving dependencies (Stata)

  • install packages locally [1]
  • commit as part of the repository
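// Make a path local to the project
// Also see my related config.do at
//   https://gist.github.com/larsvilhuber/6bcf4ff820285a1f1b9cfff2c81ca02b

local pwd "/c/path/to/project"
// Create the project-local ado directory (capture ignores "already exists")
capture mkdir "`pwd'/ado"

// Redirect Stata's package search paths to the project directory
sysdir set PERSONAL "`pwd'/ado/personal"
sysdir set PLUS     "`pwd'/ado/plus"
sysdir set SITE     "`pwd'/ado/site"

/* Now install them */
/*--- SSC packages ---*/
// esttab is distributed on SSC as part of the estout package;
// someprog is a placeholder for any other package the project needs
foreach pkg in outreg estout someprog {
  ssc install `pkg'
}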